Object of type 'closure' is not subsettable - R - json

I am using R to extract tweets and analyse their sentiment, however when I get to the lines below I get an error saying "Object of type 'closure' is not subsettable"
scores$drink = factor(rep(c("east"), nd))
scores$very.pos = as.numeric(scores$score >= 2)
scores$very.neg = as.numeric(scores$score <= -2)
Full code pasted below
load("twitCred.Rdata")
east_tweets <- filterStream("tweetselnd.json", locations = c(-0.10444, 51.408699, 0.33403, 51.64661),timeout = 120, oauth = twitCred)
tweets.df <- parseTweets("tweetselnd.json", verbose = FALSE)
##function score.sentiment
score.sentiment = function(sentences, pos.words, neg.words, .progress='none')
{
# Parameters
# sentences: vector of text to score
# pos.words: vector of words of postive sentiment
# neg.words: vector of words of negative sentiment
# .progress: passed to laply() to control of progress bar
scores = laply(sentences,
function(sentence, pos.words, neg.words)
{
# remove punctuation
sentence = gsub("[[:punct:]]", "", sentence)
# remove control characters
sentence = gsub("[[:cntrl:]]", "", sentence)
# remove digits?
sentence = gsub('\\d+', '', sentence)
# define error handling function when trying tolower
tryTolower = function(x)
{
# create missing value
y = NA
# tryCatch error
try_error = tryCatch(tolower(x), error=function(e) e)
# if not an error
if (!inherits(try_error, "error"))
y = tolower(x)
# result
return(y)
}
# use tryTolower with sapply
sentence = sapply(sentence, tryTolower)
# split sentence into words with str_split (stringr package)
word.list = str_split(sentence, "\\s+")
words = unlist(word.list)
# compare words to the dictionaries of positive & negative terms
pos.matches = match(words, pos.words)
neg.matches = match(words, neg.words)
# get the position of the matched term or NA
# we just want a TRUE/FALSE
pos.matches = !is.na(pos.matches)
neg.matches = !is.na(neg.matches)
# final score
score = sum(pos.matches) - sum(neg.matches)
return(score)
}, pos.words, neg.words, .progress=.progress )
# data frame with scores for each sentence
scores.df = data.frame(text=sentences, score=scores)
return(scores.df)
}
pos = readLines(file.choose())
neg = readLines(file.choose())
east_text = sapply(east_tweets, function(x) x$getText())
scores = score.sentiment(tweetseldn.json, pos, neg, .progress='text')
scores()$drink = factor(rep(c("east"), nd))
scores()$very.pos = as.numeric(scores()$score >= 2)
scores$very.neg = as.numeric(scores$score <= -2)
# how many very positives and very negatives
numpos = sum(scores$very.pos)
numneg = sum(scores$very.neg)
# global score
global_score = round( 100 * numpos / (numpos + numneg) )
If anyone could help with as to why I'm getting this error it will be much appreciated. Also I've seen other answeres about adding '()' when referring to the variable 'scores' such as scores()$.... but it hasn't worked for me. Thank you.

The changes below got rid of the error:
x <- scores
x$drink = factor(rep(c("east"), nd))
x$very.pos = as.numeric(x$score >= 2)
x$very.neg = as.numeric(x$score <= -2)

Related

mlogit error: (Error in model.frame.default(terms(formula, lhs = lhs, rhs = rhs, data = data, : variable lengths differ (found for 'OSTAN1'))

I have collected data from a survey in order to perform a choice based conjoint analysis. I have preprocessed and clean data. However, when I apply the function mlogit on the dataset I get the following error:
Error in model.frame.default(terms(formula, lhs = lhs, rhs = rhs, data = data, : variable lengths differ (found for 'OSTAN1')
while I omitted 'OSTAN1'variable, the same error happened for the next variable and so on.
I really do not understand why. Can please some one help me to understand what I am doing wrong ?
library("mlogit")
data("DATA3", package = "mlogit")
Electr <- dfidx(DATA3, idx = list(c("chid", "ID")),
choice = "NOE_TASADOF", varying = 2:10, sep = "")
Elec.mxl <- mlogit(NOE_TASADOF ~ TIME + WEEKEND + NOR + HAVA + GHANON +
KHASTEGI + GOVAHINAME + TAHSILAT+ OSTAN1 + OSTAN2+ OSTAN3+ AGE1+ AGE2+AGE3+ FASL1+
FASL2+FASL3+FASL4|0,
rpar=c(OSTAN1 = 'n', OSTAN2 = 'n',
OSTAN3='n', AGE1='n', AGE2='n',
AGE3='n', TIME='n',WEEKEND='n',
NOR='n',HAVA='n', GHANON='n',KHASTEGI='n',
GOVAHINAME='n', TAHSILAT='n') ,R = 100, correlation = FALSE,halton = NA, Electr , panel = TRUE)

plotTuneMultiCritResult does not work with TuneMultiCritControlMBO

I am trying to plot the Pareto front of a TuneMultiCritResult object, tuned with a control object of class TuneMultiCritControlMBO:
# multi-criteria optimization of (tpr, fpr) with MBO
lrn = makeLearner("classif.ksvm")
rdesc = makeResampleDesc("Holdout")
ps = makeParamSet(
makeNumericParam("C", lower = -12, upper = 12, trafo = function(x) 2^x),
makeNumericParam("sigma", lower = -12, upper = 12, trafo = function(x) 2^x)
)
ctrl = makeTuneMultiCritControlMBO()
res = tuneParamsMultiCrit(lrn, sonar.task, rdesc, par.set = ps,
measures = list(tpr, fpr), control = ctrl)
Printing the object res gives the following:
> res
Tune multicrit result:
Points on front: 14
> res$ind
[1] 1 2 4 5 6 7 9 11 12 14 15 16 17 18
But the length of the optimization path saved in res$opt.path only has 10 points, the ones proposed by MBO I guess.
> res$opt.path
Optimization path
Dimensions: x = 2/2, y = 2
Length: 10
Add x values transformed: FALSE
Error messages: TRUE. Errors: 0 / 10.
Exec times: TRUE. Range: 0.031 - 0.041. 0 NAs.
Since the function plotTuneMultiCritResult relies on the objects res$ind and res$opt.path to print the front, it shows weird results.
I think that the correct way to go is to copy the optimization path of the object res$mbo.result$opt.path into res$opt.path, but my question is: What's the point of having different optimization paths in res$opt.path and res$mbo.result$opt.path?
Thanks!!
VĂ­ctor
Using mlr_2.13 and mlrMBO_1.1.3 and the following code everything works like expected. I suggeset that you use the MBO Control object to specify how much iterations your optimization should have. Otherwise a default (4*d evaluations for the initial design + 10 iterations) will be used.
set.seed(1)
library(mlr)
library(mlrMBO)
# multi-criteria optimization of (tpr, fpr) with MBO
lrn = makeLearner("classif.ksvm")
rdesc = makeResampleDesc("Holdout")
ps = makeParamSet(
makeNumericParam("C", lower = -12, upper = 12, trafo = function(x) 2^x),
makeNumericParam("sigma", lower = -12, upper = 12, trafo = function(x) 2^x)
)
mbo.ctrl = makeMBOControl(n.objectives = 2)
mbo.ctrl = setMBOControlTermination(mbo.ctrl, iters = 20)
ctrl = makeTuneMultiCritControlMBO(n.objectives = 2)
res = tuneParamsMultiCrit(lrn, sonar.task, rdesc, par.set = ps,
measures = list(tpr, fpr), control = ctrl)
plotTuneMultiCritResult(res = res, path = FALSE) # path = FALSE would only shows the Pareto Front

Recall from nltk.metrics.score returning None

I'm trying to calculate the precision and recall using the nltk.metrics.score (http://www.nltk.org/_modules/nltk/metrics/scores.html) with my NLTK.NaiveBayesClassifier.
However, I stumble upon the error:
"unsupported operand type(s) for +: 'int' and 'NoneType".
which I suspect is from my 10-fold cross-validation where in some reference sets, there are zero negative (the data set is a bit imbalanced where 87% of it is positive).
According to nltk.metrics.score,
def precision(reference, test):
"Given a set of reference values and a set of test values, return
the fraction of test values that appear in the reference set.
In particular, return card(``reference`` intersection
``test``)/card(``test``).
If ``test`` is empty, then return None."
It seems that some of my 10-fold set is returning recall as None since there are no Negative in the reference set. Any idea on how to approach this problem?
My full code is as follow:
trainfeats = negfeats + posfeats
n = 10 # 5-fold cross-validation
subset_size = len(trainfeats) // n
accuracy = []
pos_precision = []
pos_recall = []
neg_precision = []
neg_recall = []
pos_fmeasure = []
neg_fmeasure = []
cv_count = 1
for i in range(n):
testing_this_round = trainfeats[i*subset_size:][:subset_size]
training_this_round = trainfeats[:i*subset_size] + trainfeats[(i+1)*subset_size:]
classifier = NaiveBayesClassifier.train(training_this_round)
refsets = collections.defaultdict(set)
testsets = collections.defaultdict(set)
for i, (feats, label) in enumerate(testing_this_round):
refsets[label].add(i)
observed = classifier.classify(feats)
testsets[observed].add(i)
cv_accuracy = nltk.classify.util.accuracy(classifier, testing_this_round)
cv_pos_precision = precision(refsets['Positive'], testsets['Positive'])
cv_pos_recall = recall(refsets['Positive'], testsets['Positive'])
cv_pos_fmeasure = f_measure(refsets['Positive'], testsets['Positive'])
cv_neg_precision = precision(refsets['Negative'], testsets['Negative'])
cv_neg_recall = recall(refsets['Negative'], testsets['Negative'])
cv_neg_fmeasure = f_measure(refsets['Negative'], testsets['Negative'])
accuracy.append(cv_accuracy)
pos_precision.append(cv_pos_precision)
pos_recall.append(cv_pos_recall)
neg_precision.append(cv_neg_precision)
neg_recall.append(cv_neg_recall)
pos_fmeasure.append(cv_pos_fmeasure)
neg_fmeasure.append(cv_neg_fmeasure)
cv_count += 1
print('---------------------------------------')
print('N-FOLD CROSS VALIDATION RESULT ' + '(' + 'Naive Bayes' + ')')
print('---------------------------------------')
print('accuracy:', sum(accuracy) / n)
print('precision', (sum(pos_precision)/n + sum(neg_precision)/n) / 2)
print('recall', (sum(pos_recall)/n + sum(neg_recall)/n) / 2)
print('f-measure', (sum(pos_fmeasure)/n + sum(neg_fmeasure)/n) / 2)
print('')
Perhaps not the most elegant, but guess the most simple fix would be setting it to 0 and the actual value if not None, e.g.:
cv_pos_precision = 0
if precision(refsets['Positive'], testsets['Positive']):
cv_pos_precision = precision(refsets['Positive'], testsets['Positive'])
And for the others as well, of course.

Extract filenames and ext. from cell array of strings

I want to remove the directory part from a cell array of strings with filenames. Of course one way would be to loop over the cell arrray and use fileparts but I have over 1e5 files and speed really matters.
My current approach is:
fns = {"/usr/local/foo.lib", "~/baz.m", "home/rms/eula.txt", "bar.m"}
filenames = cellfun (#(fn, s) fn(s+1:end), fns,
num2cell (rindex (fns, filesep())),
"UniformOutput", false)
which gives the desired output:
fns =
{
[1,1] = /usr/local/foo.lib
[1,2] = ~/baz.m
[1,3] = home/rms/eula.txt
[1,4] = bar.m
}
filenames =
{
[1,1] = foo.lib
[1,2] = baz.m
[1,3] = eula.txt
[1,4] = bar.m
}
and takes approx 2e-5s per file. Is there a better (faster, more readable) way to do this?
EDIT I've added Sardars solution and my previous attempt with regex and some benchmark results:
fns = {"/usr/local/foo.lib", "~/baz.m", "home/rms/eula.txt", "bar.m"};
fns = repmat (fns, 1, 1e4);
tic
f1 = cellfun (#(fn, s) fn(s+1:end), fns,
num2cell (rindex (fns, "/")),
"UniformOutput", false);
toc
tic
[~, ~, ~, M] = regexp (fns, "[^\/]+$", "lineanchors");
f2 = cell2mat (M);
toc
tic
## Asnwer from Sardar Usama
f3 = regexprep(fns, '.*/', '');
toc
assert (f1, f2)
assert (f1, f3)
which gives
Elapsed time is 0.729995 seconds. (Original code with cellfun)
Elapsed time is 0.67545 seconds. (using regexp)
Elapsed time is 0.230487 seconds. (using regexprep)
Use regexprep to search the strings till the last / and replace the occurrences with an empty string.
filenames = regexprep(fns, '.*/', '');

Why the features are overwritten when extracted using pycaffe?

I observed that the features are overwritten when I extract them using pycaffe. My code is as follows:
tImg_1 = misc.imread('1.jpg')
tImg_1 = tImg_1[:,:,::-1] # color channel swap
tImg_2 = misc.imread('2.jpg')
tImg_2 = tImg_2[:,:,::-1] # color channel swap
tImg_1 = (np.float32(tImg_1)- 127.5)/128 # mean substruction
tImg_2 = (np.float32(tImg_2)- 127.5)/128 # mean substruction
tI_1 = np.moveaxis(tImg_1, 0, 1) # Transpose
tI_2 = np.moveaxis(tImg_2, 0, 1) # Transpose
# Extract features
tI_1 = np.reshape(tImg_1, (1, tImg_1.shape[2], tImg_1.shape[0], tImg_1.shape[1]))
tI_2 = np.reshape(tImg_2, (1, tImg_2.shape[2], tImg_2.shape[0], tImg_2.shape[1]))
net.blobs['data'].data[...] = tI_1
net.forward()
fts_1 = net.blobs['fc5'].data
print(fts_1[0, 0])
net.blobs['data'].data[...] = tI_2
net.forward()
fts_2 = net.blobs['fc5'].data
print(fts_2[0, 0])
print(fts_1[0, 0])
Executing this provides the following output:
0.508398
-0.176945
-0.176945
That means the values of fts_1 is overwritten by fts_2. How can I avoid this problem?
fts_1 is just pointing to net.blobs['fc5'].data. You need to make a deepcopy of the object. So your first assignment should be fts_1 = copy.deepcopy(net.blobs['fc5'].data)