Extract filenames and ext. from cell array of strings - octave

I want to remove the directory part from a cell array of strings with filenames. Of course one way would be to loop over the cell arrray and use fileparts but I have over 1e5 files and speed really matters.
My current approach is:
fns = {"/usr/local/foo.lib", "~/baz.m", "home/rms/eula.txt", "bar.m"}
filenames = cellfun (#(fn, s) fn(s+1:end), fns,
num2cell (rindex (fns, filesep())),
"UniformOutput", false)
which gives the desired output:
fns =
{
[1,1] = /usr/local/foo.lib
[1,2] = ~/baz.m
[1,3] = home/rms/eula.txt
[1,4] = bar.m
}
filenames =
{
[1,1] = foo.lib
[1,2] = baz.m
[1,3] = eula.txt
[1,4] = bar.m
}
and takes approx 2e-5s per file. Is there a better (faster, more readable) way to do this?
EDIT I've added Sardars solution and my previous attempt with regex and some benchmark results:
fns = {"/usr/local/foo.lib", "~/baz.m", "home/rms/eula.txt", "bar.m"};
fns = repmat (fns, 1, 1e4);
tic
f1 = cellfun (#(fn, s) fn(s+1:end), fns,
num2cell (rindex (fns, "/")),
"UniformOutput", false);
toc
tic
[~, ~, ~, M] = regexp (fns, "[^\/]+$", "lineanchors");
f2 = cell2mat (M);
toc
tic
## Asnwer from Sardar Usama
f3 = regexprep(fns, '.*/', '');
toc
assert (f1, f2)
assert (f1, f3)
which gives
Elapsed time is 0.729995 seconds. (Original code with cellfun)
Elapsed time is 0.67545 seconds. (using regexp)
Elapsed time is 0.230487 seconds. (using regexprep)

Use regexprep to search the strings till the last / and replace the occurrences with an empty string.
filenames = regexprep(fns, '.*/', '');

Related

Why are the values ​of the expression not displayed in numeric format?

I have an expression (signal) and I want to get certain results using a loop.
Original formula
Code:
Ak=1.2;
fk=0.0123;
jK=0.321;
alphaK=-0.01;
T=1;
#Signal
#xi=1.2*cos(0,0123*2*pi*(i-1)+0.321)*eps(–0.01*i);
for i=1:4
x(i)=Ak*cos(fk*2*pi*(i-1)+jK)*eps(alphaK*i);
end
The results are in this format
x = 1.9753e-18
x = 1.9753e-18 3.8375e-18
x = 1.9753e-18 3.8375e-18 3.7013e-18
x = 1.9753e-18 3.8375e-18 3.7013e-18 7.0863e-18
But the correct results look like this:
How do I change the format to get the correct display?
replace
x(i)=Ak*cos(fk*2*pi*(i-1)+jK)*eps(alphaK*i);
with
x(i)=Ak*cos(fk*2*pi*(i-1)+jK)*exp(alphaK*i);
and you will have a result much near to the expected one
x =
1.12737 1.08417 1.03531 0.98119
eps is NOT the exponential function
octave:3> eps
ans = 2.2204e-16
octave:4> eps(x)
ans =
2.2204e-16 2.2204e-16 2.2204e-16 1.1102e-16
octave:5> exp(x)
ans =
3.0875 2.9570 2.8160 2.6676

How to handle outliers in this octave boxplot for improved readability

Outliers between 1,5 - 3 times the interquantile range is marked with an "+" and above 3 times the IQR with an "o". But due to this data set with multiple outliers the below boxplot is very hard to read since the "+" and "o" symbols are plotted on top of each other creating what appears to be a thick red line.
I need to plot all data so removing them is not an option but I would be fine to display "longer" boxes, i.e. stretch the q1 and q4 to reach the true min/max values and skip the "+" and "o" outlier symbols. I would also be fine if just the min and max outliers was displayed.
I'm totally in the dark here and the octave boxplot documentation found here did not include any helpful examples on how to handle outliers. A search here at stackoverflow didn't get me closer to a solution either. So any help or directions is very appreciated!
How can I modify the below code to create a boxplot based on the same data set that is readable (i.e. doesn't plot outliers on top of each other creating a thick red line)?
I'm using Octave 4.2.1 64-bits on a Windows 10 machine with qt as the graphics_toolkit and with GDAL_TRANSLATE called from within Octave to handle the tif-files.
It's not an option to switch graphics_toolkit to gnuplot etc. since I haven't been able to "rotate" the plot (horizontal boxes instead of vertical). And it's in the .pdf file the results must have an effect, not only in octaves viewer.
Please forgive my totally "newbie-style" coding-work-around to get a proper high resolution pdf-exported:
pkg load statistics
clear all;
fns = glob ("*.tif");
for k=1:numel (fns)
ofn = tmpnam;
cmd = sprintf ('gdal_translate -of aaigrid "%s" "%s"', fns{k}, ofn);
[s, out] = system (cmd);
if (s != 0)
error ('calling gdal_translate failed with "%s"', out);
endif
fid = fopen (ofn, "r");
# read 6 headerlines
hdr = [];
for i=1:6
s = strsplit (fgetl (fid), " ");
hdr.(s{1}) = str2double (s{2});
endfor
d = dlmread (fid);
# check size against header
assert (size (d), [hdr.nrows hdr.ncols])
# set nodata to NA
d (d == hdr.NODATA_value) = NA;
raw{k} = d;
# create copy with existing values
raw_v{k} = d(! isna (d));
fclose (fid);
endfor
## generate plot
boxplot (raw_v)
set (gca, "xtick", 1:numel(fns),
"xticklabel", strrep (fns, ".tif", ""));
ylabel ("Plats kvar (meter)");
set (gca, "ytick", 0:50:600);
set (gca, "ygrid", "on");
set (gca, "gridlinestyle", "--");
set (gcf, "paperunit", "centimeters", "papersize", [35, 60], "paperposition", [0 0 60 30], "paperorientation", "landscape")
zoom (0.95)
view ([90 90])
print ("loudden_box_dotted.pdf", "-F:14")
I would just delete the outliers. This is easy because the handles are returned. I've also included some caching algorithm so you don't have to reload all tifs if you are playing with plots. Splitting the conversion, processing and plotting in different scripts is always a good idea (but not for stackoverflow where minimalistic examples are prefered). Here we go:
pkg load statistics
cache_fn = "input.raw";
# only process tif if not already done
if (! exist (cache_fn, "file"))
fns = glob ("*.tif");
for k=1:numel (fns)
ofn = tmpnam;
cmd = sprintf ('gdal_translate -of aaigrid "%s" "%s"', fns{k}, ofn);
printf ("calling '%s'...\n", cmd);
fflush (stdout);
[s, out] = system (cmd);
if (s != 0)
error ('calling gdal_translate failed with "%s"', out);
endif
fid = fopen (ofn, "r");
# read 6 headerlines
hdr = [];
for i=1:6
s = strsplit (fgetl (fid), " ");
hdr.(s{1}) = str2double (s{2});
endfor
d = dlmread (fid);
# check size against header
assert (size (d), [hdr.nrows hdr.ncols])
# set nodata to NA
d (d == hdr.NODATA_value) = NA;
raw{k} = d;
# create copy with existing values
raw_v{k} = d(! isna (d));
fclose (fid);
endfor
# save result
save (cache_fn, "raw_v", "fns");
else
load (cache_fn)
endif
## generate plot
[s, h] = boxplot (raw_v);
## in h you'll find now box, whisker, median, outliers and outliers2
## delete them
delete (h.outliers)
delete (h.outliers2)
set (gca, "xtick", 1:numel(fns),
"xticklabel", strrep (fns, ".tif", ""));
ylabel ("Plats kvar (meter)");
set (gca, "ytick", 0:50:600);
set (gca, "ygrid", "on");
set (gca, "gridlinestyle", "--");
set (gcf, "paperunit", "centimeters", "papersize", [35, 60], "paperposition", [0 0 60 30], "paperorientation", "landscape")
zoom (0.95)
view ([90 90])
print ("loudden_box_dotted.pdf", "-F:14")
gives

Object of type 'closure' is not subsettable - R

I am using R to extract tweets and analyse their sentiment, however when I get to the lines below I get an error saying "Object of type 'closure' is not subsettable"
scores$drink = factor(rep(c("east"), nd))
scores$very.pos = as.numeric(scores$score >= 2)
scores$very.neg = as.numeric(scores$score <= -2)
Full code pasted below
load("twitCred.Rdata")
east_tweets <- filterStream("tweetselnd.json", locations = c(-0.10444, 51.408699, 0.33403, 51.64661),timeout = 120, oauth = twitCred)
tweets.df <- parseTweets("tweetselnd.json", verbose = FALSE)
##function score.sentiment
score.sentiment = function(sentences, pos.words, neg.words, .progress='none')
{
# Parameters
# sentences: vector of text to score
# pos.words: vector of words of postive sentiment
# neg.words: vector of words of negative sentiment
# .progress: passed to laply() to control of progress bar
scores = laply(sentences,
function(sentence, pos.words, neg.words)
{
# remove punctuation
sentence = gsub("[[:punct:]]", "", sentence)
# remove control characters
sentence = gsub("[[:cntrl:]]", "", sentence)
# remove digits?
sentence = gsub('\\d+', '', sentence)
# define error handling function when trying tolower
tryTolower = function(x)
{
# create missing value
y = NA
# tryCatch error
try_error = tryCatch(tolower(x), error=function(e) e)
# if not an error
if (!inherits(try_error, "error"))
y = tolower(x)
# result
return(y)
}
# use tryTolower with sapply
sentence = sapply(sentence, tryTolower)
# split sentence into words with str_split (stringr package)
word.list = str_split(sentence, "\\s+")
words = unlist(word.list)
# compare words to the dictionaries of positive & negative terms
pos.matches = match(words, pos.words)
neg.matches = match(words, neg.words)
# get the position of the matched term or NA
# we just want a TRUE/FALSE
pos.matches = !is.na(pos.matches)
neg.matches = !is.na(neg.matches)
# final score
score = sum(pos.matches) - sum(neg.matches)
return(score)
}, pos.words, neg.words, .progress=.progress )
# data frame with scores for each sentence
scores.df = data.frame(text=sentences, score=scores)
return(scores.df)
}
pos = readLines(file.choose())
neg = readLines(file.choose())
east_text = sapply(east_tweets, function(x) x$getText())
scores = score.sentiment(tweetseldn.json, pos, neg, .progress='text')
scores()$drink = factor(rep(c("east"), nd))
scores()$very.pos = as.numeric(scores()$score >= 2)
scores$very.neg = as.numeric(scores$score <= -2)
# how many very positives and very negatives
numpos = sum(scores$very.pos)
numneg = sum(scores$very.neg)
# global score
global_score = round( 100 * numpos / (numpos + numneg) )
If anyone could help with as to why I'm getting this error it will be much appreciated. Also I've seen other answeres about adding '()' when referring to the variable 'scores' such as scores()$.... but it hasn't worked for me. Thank you.
The changes below got rid of the error:
x <- scores
x$drink = factor(rep(c("east"), nd))
x$very.pos = as.numeric(x$score >= 2)
x$very.neg = as.numeric(x$score <= -2)

Display struct fields without the mess

I have a struct in Octave that contains some big arrays.
I'd like to know the names of the fields in this struct without having to look at all these big arrays.
For instance, if I have:
x.a=1;
x.b=rand(3);
x.c=1;
The obvious way to take a gander at the structure is as follows:
octave:12> x
x =
scalar structure containing the fields:
a = 1
b =
0.7195967 0.9026158 0.8946427
0.4647287 0.9561791 0.5932929
0.3013618 0.2243270 0.5308220
c = 1
In Matlab, this would appear as the more succinct:
>> x
x =
a: 1
b: [3x3 double]
c: 1
How can I see the fields/field names without seeing all these big arrays?
Is there a way to display a succinct overview (like Matlab's) inside Octave?
Thanks!
You might want to take a look at Basic Usage & Examples. There's several functions mentioned that sound like they'll control the displaying in the terminal.
struct_levels_to_print
print_struct_array_contents
These two functions sound like they're do what you want. I tried both and couldn't get the 2nd one to work. The 1st function changed the terminal output like so:
octave:1> x.a=1;
octave:2> x.b=rand(3);
octave:3> x.c=1;
octave:4> struct_levels_to_print
ans = 2
octave:5> x
x =
{
a = 1
b =
0.153420 0.587895 0.290646
0.050167 0.381663 0.330054
0.026161 0.036876 0.818034
c = 1
}
octave:6> struct_levels_to_print(0)
octave:7> x
x =
{
1x1 struct array containing the fields:
a: 1x1 scalar
b: 3x3 matrix
c: 1x1 scalar
}
I'm running a older version of Octave.
octave:8> version
ans = 3.2.4
If I get a chance I'll check that other function, print_struct_array_contents, to see if it does what you want. Octave 3.6.2 looks to be the latest version as of 11/2012.
Use fieldnames ()
octave:33> x.a = 1;
octave:34> x.b = rand(3);
octave:35> x.c = 1;
octave:36> fieldnames (x)
ans =
{
[1,1] = a
[2,1] = b
[3,1] = c
}
Or you you want it to be recursive, add the following to your .octaverc file (you may want to adjust it to your preferences)
function displayfields (x, indent = "")
if (isempty (indent))
printf ("%s: ", inputname (1))
endif
if (isstruct (x))
printf ("structure containing the fields:\n");
indent = [indent " "];
nn = fieldnames (x);
for ii = 1:numel(nn)
if (isstruct (x.(nn{ii})))
printf ("%s %s: ", indent, nn{ii});
displayfields (x.(nn{ii}), indent)
else
printf ("%s %s\n", indent, nn{ii})
endif
endfor
else
display ("not a structure");
endif
endfunction
You can then use it in the following way:
octave> x.a=1;
octave> x.b=rand(3);
octave> x.c.stuff = {2, 3, 45};
octave> x.c.stuff2 = {"some", "other"};
octave> x.d=1;
octave> displayfields (x)
x: structure containing the fields:
a
b
c: structure containing the fields:
stuff
stuff2
d
in Octave, version 4.0.0 configured for "x86_64-pc-linux-gnu".(Ubuntu 16.04)
I did this on the command line:
print_struct_array_contents(true)
sampleFactorList % example, my nested structure array
Output: (shortened):
sampleFactorList =
scalar structure containing the fields:
sampleFactorList =
1x6 struct array containing the fields:
var =
{
[1,1] = 1
[1,2] =
2 1 3
}
card =
{
[1,1] = 3
[1,2] =
3 3 3
}
To disable/get back to the old behaviour
print_struct_array_contents(false)
sampleFactorList
sampleFactorList =
scalar structure containing the fields:
sampleFactorList =
1x6 struct array containing the fields:
var
card
val
I've put this print_struct_array_contents(true) also into the .octaverc file.

LZW Compression In Lua

Here is the Pseudocode for Lempel-Ziv-Welch Compression.
pattern = get input character
while ( not end-of-file ) {
K = get input character
if ( <<pattern, K>> is NOT in
the string table ){
output the code for pattern
add <<pattern, K>> to the string table
pattern = K
}
else { pattern = <<pattern, K>> }
}
output the code for pattern
output EOF_CODE
I am trying to code this in Lua, but it is not really working. Here is the code I modeled after an LZW function in Python, but I am getting an "attempt to call a string value" error on line 8.
function compress(uncompressed)
local dict_size = 256
local dictionary = {}
w = ""
result = {}
for c in uncompressed do
-- while c is in the function compress
local wc = w + c
if dictionary[wc] == true then
w = wc
else
dictionary[w] = ""
-- Add wc to the dictionary.
dictionary[wc] = dict_size
dict_size = dict_size + 1
w = c
end
-- Output the code for w.
if w then
dictionary[w] = ""
end
end
return dictionary
end
compressed = compress('TOBEORNOTTOBEORTOBEORNOT')
print (compressed)
I would really like some help either getting my code to run, or helping me code the LZW compression in Lua. Thank you so much!
Assuming uncompressed is a string, you'll need to use something like this to iterate over it:
for i = 1, #uncompressed do
local c = string.sub(uncompressed, i, i)
-- etc
end
There's another issue on line 10; .. is used for string concatenation in Lua, so this line should be local wc = w .. c.
You may also want to read this with regard to the performance of string concatenation. Long story short, it's often more efficient to keep each element in a table and return it with table.concat().
You should also take a look here to download the source for a high-performance LZW compression algorithm in Lua...